Yixiu Mao

RLVR without Ineffective Samples: Group Prioritized Off-Policy Optimization for LLM Reasoning

May 31, 2026

Listwise Policy Optimization: Group-based RLVR as Target-Projection on the LLM Response Simplex

May 07, 2026

Dynamics-Predictive Sampling for Active RL Finetuning of Large Reasoning Models

Mar 11, 2026

Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models

Feb 02, 2026

Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

Sep 19, 2025

Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments

Apr 27, 2025

Beyond Any-Shot Adaptation: Predicting Optimization Outcome for Robustness Gains without Extra Pay

Jan 19, 2025

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

Dec 15, 2024

Doubly Mild Generalization for Offline Reinforcement Learning

Nov 13, 2024

Offline Reinforcement Learning with OOD State Correction and OOD Action Suppression

Oct 28, 2024